
@amyfromandi (Collaborator) commented Jan 14, 2026

Object Storage API Summary

Tied to #8

This API provides a database-backed, S3-compatible object registry for managing uploaded files used by ingestion pipelines and downstream services. The database is treated as the source of truth, while MinIO/S3 is used as the physical storage layer.

The API is fully stateless, does not rely on ORM models or schemas, and operates using explicit SQL queries and direct MinIO operations.


GET /object

List objects from storage.object

  • Returns paginated results from storage.object
  • Filters:
    • slug → prefix match on object key (slug/%)
    • include_deleted → include soft-deleted rows
  • Sorted by most recent first
  • Database is the source of truth
  • No MinIO access required
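
The listing rules above can be sketched as a small query builder. This is a minimal sketch under stated assumptions, not the code in this PR. It uses Python's built-in sqlite3 as a runnable stand-in for the real database, with an attached in-memory database named storage so that the table can be addressed as storage.object. The function names, the created_on column, and the limit/offset pagination parameters are hypothetical.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    # Runnable stand-in for the real database: in-memory SQLite with an attached
    # "storage" database so the table is addressable as storage.object.
    # The column set is assumed for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS storage")
    conn.execute(
        "CREATE TABLE storage.object ("
        " id INTEGER PRIMARY KEY, bucket TEXT NOT NULL, key TEXT NOT NULL,"
        " created_on TEXT NOT NULL, deleted_on TEXT)"
    )
    return conn


def list_objects(conn, slug=None, include_deleted=False, limit=50, offset=0):
    """Return one page of (id, key) rows from storage.object, newest first."""
    clauses, params = [], []
    if slug is not None:
        # Escape LIKE wildcards so slug "a_b" matches "a_b/..." but not "axb/...".
        escaped = slug.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        clauses.append("key LIKE ? ESCAPE '\\'")
        params.append(escaped + "/%")
    if not include_deleted:
        # Soft-deleted rows are hidden unless the caller asks for them.
        clauses.append("deleted_on IS NULL")
    where = "WHERE " + " AND ".join(clauses) if clauses else ""
    sql = (
        f"SELECT id, key FROM storage.object {where} "
        "ORDER BY created_on DESC, id DESC LIMIT ? OFFSET ?"
    )
    return conn.execute(sql, [*params, limit, offset]).fetchall()
```

Escaping the wildcards keeps a slug that contains _ or % matching only its own prefix. SQLite's LIKE is case-insensitive for ASCII letters by default, whereas PostgreSQL's LIKE is case-sensitive, so this stand-in is slightly looser than a Postgres query would be.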

GET /object/{id}

Fetch a single object by ID

  • Returns full object metadata including:
    • bucket, key, sha256, mime type, timestamps, deleted state
  • Returns 404 if object does not exist
  • Database only
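
Fetching by ID needs only a single parameterised SELECT. This is a minimal sketch, again using sqlite3 as a stand-in for the real database. NotFound, get_object, and the exact column list are hypothetical names chosen for illustration, and the HTTP layer is assumed to turn NotFound into the 404 response.

```python
import sqlite3

# Columns returned for a single object (names assumed for illustration).
COLUMNS = ("id", "bucket", "key", "sha256", "mime_type",
           "created_on", "updated_on", "deleted_on")


class NotFound(Exception):
    """Hypothetical error that the HTTP layer would translate into a 404."""

    status_code = 404


def get_object(conn: sqlite3.Connection, object_id: int) -> dict:
    # Soft-deleted rows are still returned; deleted_on tells the caller.
    row = conn.execute(
        f"SELECT {', '.join(COLUMNS)} FROM storage.object WHERE id = ?",
        (object_id,),
    ).fetchone()
    if row is None:
        raise NotFound(f"object {object_id} does not exist")
    return dict(zip(COLUMNS, row))
```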

POST /object

Upload and register new objects

  • Accepts multipart/form-data with files under the field name "object"
  • Supports multiple files per request
  • For each file:
    • Computes SHA256 while streaming
    • Uploads to MinIO bucket
    • Registers metadata in storage.object
    • Skips insert if identical key already exists
  • Returns list of created objects
  • Requires authenticated access

We may want to remove this functionality, since it would allow the database contents to diverge from what is in the S3 bucket.
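
The per-file upload steps can be sketched with a read-through hashing wrapper, so the SHA-256 is computed in the same pass that streams bytes to storage. This sketch is illustrative only. HashingReader, register_object, upload_and_register, MemoryStore, and the put callable are hypothetical, sqlite3 stands in for the real database, and the UNIQUE (bucket, key) constraint that drives ON CONFLICT ... DO NOTHING is an assumed schema detail. A real S3 client could consume the wrapper if it reads its input through read(). For instance, minio-py's put_object takes a data object with a read() method (with length=-1 and a part_size when the size is unknown).

```python
import hashlib
import io
import sqlite3


class HashingReader:
    """File-like wrapper that feeds every chunk it returns into a SHA-256 digest."""

    def __init__(self, stream):
        self._stream = stream
        self._digest = hashlib.sha256()
        self.size = 0

    def read(self, n=-1):
        chunk = self._stream.read(n)
        self._digest.update(chunk)
        self.size += len(chunk)
        return chunk

    def hexdigest(self):
        return self._digest.hexdigest()


class MemoryStore:
    """Hypothetical test double for the S3 side: keeps uploaded bytes in a dict."""

    def __init__(self):
        self.objects = {}

    def put(self, bucket, key, reader):
        self.objects[(bucket, key)] = reader.read()


def register_object(conn, bucket, key, sha256, mime_type):
    """Insert a row unless (bucket, key) is already registered; True if inserted."""
    cur = conn.execute(
        "INSERT INTO storage.object (bucket, key, sha256, mime_type) "
        "VALUES (?, ?, ?, ?) ON CONFLICT (bucket, key) DO NOTHING",
        (bucket, key, sha256, mime_type),
    )
    conn.commit()
    return cur.rowcount == 1


def upload_and_register(conn, put, bucket, key, stream, mime_type):
    # `put(bucket, key, reader)` must read the stream to the end;
    # the digest is only complete after it returns.
    reader = HashingReader(stream)
    put(bucket, key, reader)
    return register_object(conn, bucket, key, reader.hexdigest(), mime_type)


def open_demo_db():
    # In-memory SQLite stand-in for the real database (schema assumed).
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS storage")
    conn.execute(
        "CREATE TABLE storage.object (id INTEGER PRIMARY KEY, bucket TEXT NOT NULL,"
        " key TEXT NOT NULL, sha256 TEXT, mime_type TEXT, UNIQUE (bucket, key))"
    )
    return conn


if __name__ == "__main__":
    store = MemoryStore()
    conn = open_demo_db()
    print(upload_and_register(conn, store.put, "uploads", "maps/a.txt",
                              io.BytesIO(b"abc"), "text/plain"))
```

Because the sketch follows the listed order (upload, then insert), re-uploading an existing key replaces the bytes in the bucket while the existing row and its sha256 stay unchanged. This is one concrete way the database and the bucket can drift apart.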

PATCH /object/{id}

Update object metadata in database only

  • Supported fields:
    • key
    • mime_type
    • source
  • Does not rename objects in MinIO
  • Updates timestamp automatically
  • Returns updated object row
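
A metadata-only PATCH can be sketched as a whitelisted UPDATE. The updated_on column name, the patch_object function, and the choice to reject unknown fields with ValueError are assumptions. CURRENT_TIMESTAMP is used because both SQLite (the stand-in here) and PostgreSQL accept it.

```python
import sqlite3

# Fields the PATCH endpoint accepts, per the description above.
PATCHABLE = ("key", "mime_type", "source")


def patch_object(conn, object_id, changes):
    """Update metadata columns of one row; the object in the bucket is not renamed.

    Returns the updated row, or None if no row has that id (a 404 upstream).
    """
    unknown = set(changes) - set(PATCHABLE)
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    if not changes:
        raise ValueError("nothing to update")
    # Column names come only from the whitelist; values are bound parameters.
    cols = [c for c in PATCHABLE if c in changes]
    assignments = ", ".join(f"{c} = ?" for c in cols)
    cur = conn.execute(
        f"UPDATE storage.object SET {assignments}, updated_on = CURRENT_TIMESTAMP "
        "WHERE id = ?",
        [*(changes[c] for c in cols), object_id],
    )
    conn.commit()
    if cur.rowcount == 0:
        return None
    return conn.execute(
        "SELECT id, bucket, key, mime_type, source, updated_on "
        "FROM storage.object WHERE id = ?",
        (object_id,),
    ).fetchone()


def open_demo_db():
    # In-memory SQLite stand-in for the real database (schema assumed).
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS storage")
    conn.execute(
        "CREATE TABLE storage.object (id INTEGER PRIMARY KEY, bucket TEXT NOT NULL,"
        " key TEXT NOT NULL, mime_type TEXT, source TEXT, updated_on TEXT)"
    )
    return conn
```

Only the row changes, so renaming key this way leaves the bytes stored under their original key in the bucket.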

DELETE /object/{id}?hard=true|false

Delete object

Soft delete (hard=false)

  • Marks deleted_on in database
  • Keeps object in MinIO

Hard delete (hard=true, default)

  • Deletes object from MinIO bucket using row’s bucket/key
  • Deletes row from storage.object
  • If MinIO deletion fails, database row is preserved

This ordering ensures that a database row is never removed while its object still exists in the bucket, keeping physical storage and database state consistent.
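
The deletion ordering can be sketched as follows. The storage argument is duck-typed on a remove_object(bucket, key) method, chosen because minio-py's Minio client exposes a method with that name and argument order. StorageError, InMemoryStorage, and delete_object are hypothetical, and sqlite3 again stands in for the real database.

```python
import sqlite3


class StorageError(Exception):
    """Hypothetical error raised when the S3 backend cannot remove an object."""


class InMemoryStorage:
    """Test double for an S3 client; models only remove_object(bucket, key)."""

    def __init__(self, keys, fail=False):
        self.keys = set(keys)
        self.fail = fail

    def remove_object(self, bucket, key):
        if self.fail:
            raise StorageError(f"could not remove {bucket}/{key}")
        self.keys.discard((bucket, key))


def delete_object(conn, storage, object_id, hard=True):
    """Soft- or hard-delete one row; returns False if the id does not exist."""
    row = conn.execute(
        "SELECT bucket, key FROM storage.object WHERE id = ?", (object_id,)
    ).fetchone()
    if row is None:
        return False
    if not hard:
        # Soft delete: mark the row and leave the bytes in the bucket.
        conn.execute(
            "UPDATE storage.object SET deleted_on = CURRENT_TIMESTAMP WHERE id = ?",
            (object_id,),
        )
    else:
        # Hard delete: storage first. If remove_object raises, the DELETE below
        # never runs, so the row still points at the surviving object.
        storage.remove_object(*row)
        conn.execute("DELETE FROM storage.object WHERE id = ?", (object_id,))
    conn.commit()
    return True


def open_demo_db():
    # In-memory SQLite stand-in for the real database (schema assumed).
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS storage")
    conn.execute(
        "CREATE TABLE storage.object (id INTEGER PRIMARY KEY, bucket TEXT NOT NULL,"
        " key TEXT NOT NULL, deleted_on TEXT)"
    )
    return conn
```

The ordering is not a full transaction: if the storage call succeeds and the DELETE then fails, the row survives and points at a missing object. Retrying the hard delete is harmless in that case, because S3's DeleteObject reports success for a key that no longer exists.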


Design Principles

  • Database is always the source of truth
  • MinIO is treated as an external storage backend
  • No ORM or schema dependencies
  • All queries are explicit SQL
  • Designed to allow future expansion to multiple buckets
  • Safe deletion ordering prevents orphaned storage or broken references
  • Fully compatible with ingestion pipelines and CLI tooling

@davenquinn (Member) commented Jan 15, 2026

I like it!

  • The slug prefix matching will be useful.
  • Are the slugs modifiable in the metadata table? I could see some value in that, as it would allow file "renaming" (at least from the user's perspective).
  • Something I could see being valuable is /track and /forget functionality that lets files already in S3 be checked in or out of management. This would allow us to manage files that are handled through other means (e.g., giant/multipart uploads).

We should probably have a helper route that redirects to an object's direct URL (or mints a signed URL) so that we can quickly view/download files.

A small definitional quibble: the MinIO client is used to implement the S3 API in Python, but fundamentally the storage backend can be MinIO, Ceph, AWS, or any other S3-compatible object store.

@davenquinn (Member) left a comment

Overall this looks pretty good, but ideally some of the naming would be tightened (especially for environment variables).

There can be elaboration for new features (e.g., multiple buckets, tracking "unowned" files) but that isn't necessary right now.

@amyfromandi merged commit 020942b into main on Jan 16, 2026
@amyfromandi deleted the files_v3 branch on January 16, 2026 at 18:51
